A few months ago, I adopted two male kittens from the Idaho Falls Animal Shelter. The wait between seeing the listing and actually taking them home felt like a lifetime.
According to the Animal Humane Society¹, cats spend an average of 14 days between intake and adoption. Additionally, the East Dallas Veterinary Clinic² states that male cats tend to be more sociable and playful than their female counterparts. Based on these assumptions, we will use a simple logistic regression to answer the following question:
What is the probability that the cat is a male given how many days it took for them to get adopted?
Click the Show Data tab below to learn more about the data and its collection.
This data was gathered from the Idaho Falls Animal Shelter’s Facebook page. On their page, they post cats along with information such as their age (Months), gender (1 = Male, 0 = Female), the day they become available for adoption, and the day they get adopted. I recorded these characteristics in the table below. I recoded gender into binary values in order to conduct the logistic regression test, and added a column that subtracts the day each cat was posted from the day it was adopted to get the waiting period from listing to homing.
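library(dplyr)  # mutate(), select(), %>%
library(DT)     # datatable()
# Recode Sex as a binary indicator (1 = male, 0 = female) and keep the analysis columns
adoptions <- catadoptions %>%
  mutate(Gender = ifelse(Sex == "M", 1, 0)) %>%
  select(Age, Gender, Posted, Adopted, DaysListed)
# Interactive table with selectable page lengths
datatable(adoptions, options = list(lengthMenu = c(3, 10, 30)))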
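The waiting-period column described above could be derived as in the sketch below. This is only an illustration: the three rows are hypothetical, the data frame name `toy` is made up, and it assumes `Posted` and `Adopted` are stored as dates, which the original data may not be.

```r
# Hypothetical example rows (not the shelter's real records)
toy <- data.frame(
  Sex     = c("M", "F", "M"),
  Posted  = as.Date(c("2024-01-02", "2024-01-05", "2024-01-10")),
  Adopted = as.Date(c("2024-01-09", "2024-01-26", "2024-01-12"))
)

# Binary indicator: 1 = male, 0 = female
toy$Gender <- ifelse(toy$Sex == "M", 1, 0)

# Days from listing to adoption (subtracting two Dates gives a difference in days)
toy$DaysListed <- as.numeric(toy$Adopted - toy$Posted)

print(toy)
```

If the dates were recorded as text, they would first need converting with `as.Date()` and a matching format string.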
The probability that a cat is male, given how many days it took to get adopted, is described by the logistic regression model below:
\[P(Y_i = 1 | x_i ) = \frac{e^{\beta_0+\beta_1\text{Days Listed}}}{1+e^{\beta_0+\beta_1\text{Days Listed}}} = \pi_i\]
| Part | What it means |
|---|---|
| \(Y_i\) (Response variable) | Indicates whether an individual cat is male (\(Y_i = 1\)) or female (\(Y_i = 0\)). |
| \(x_i\) (Explanatory variable) | Describes the observed value of the number of days an individual cat was listed. |
Our \(\beta_1\) value determines the proportional change in the odds that \(Y_i = 1\) (that the adopted cat is male) for each additional day listed; that change equals \(e^{\beta_1}\). Our hypotheses therefore test whether the slope has a significant effect on the probability that the cat is male:
\[H_0 : \beta_1 = 0\]
\[H_a : \beta_1 \neq 0\]
Additionally, our level of significance will be:
\[\alpha = 0.05\]
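To make the role of \(\beta_1\) concrete, the model can be rewritten on the odds scale. The short derivation below uses only the symbols already defined and is meant as a sketch of the reasoning, not an additional result:

\[\frac{\pi_i}{1-\pi_i} = e^{\beta_0+\beta_1 x_i}\]
\[\frac{\text{odds at } x_i + 1}{\text{odds at } x_i} = \frac{e^{\beta_0+\beta_1 (x_i+1)}}{e^{\beta_0+\beta_1 x_i}} = e^{\beta_1}\]

Under \(H_0\), \(e^{\beta_1} = e^{0} = 1\), so the odds that a cat is male would not change with the number of days listed.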
To visualize how long cats of each gender wait, we will use a scatter plot. The values listed as 1 are the male cats and the values listed as 0 are the female cats. The color gradient of the dots shows how long a specific cat waited from being listed to being adopted: the lighter the dot, the shorter the wait (0-5 days), and the darker the dot, the longer the wait (>20 days). Judging by the slope of the curve, it seems that as the days go on, the probability of the cat being male decreases.
However, the curve of the plot does not approach 0 or 1 within the range of the data. Therefore, the logistic regression is not very useful for this data.
library(ggplot2)  # plotting
library(plotly)   # ggplotly() for interactive tooltips
# Scatter plot of gender (0/1) against days listed, with the fitted logistic curve
catgraph <- ggplot(data = adoptions, aes(x = DaysListed, y = Gender, color = DaysListed)) +
  geom_point(size = 4, alpha = 0.8, aes(
    text = paste(
      "Age (Months):", Age, "<br>",
      "Listed:", DaysListed, "days<br>"))) +
  geom_smooth(method = "glm",
              method.args = list(family = "binomial"), se = FALSE, color = "steelblue1") +
  scale_color_gradient(low = "steelblue1", high = "royalblue4") +
  labs(title = "Idaho Falls Animal Shelter Cat Adoptions",
       x = "Days Listed",
       y = "Gender (Male = 1, Female = 0)") +
  theme_minimal() +
  theme(
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  )
# Convert to an interactive plot that shows the custom tooltip text
ggplotly(catgraph, tooltip = "text")
Before we dive into our logistic regression test, we need to check whether the model appears to fit the data well. To see if there is evidence that the above logistic regression is NOT a good fit for the data, we will use the Deviance Goodness of Fit Test.
Click the Show Deviance Goodness of Fit Test tab to see the test results.
The hypotheses of the Deviance Goodness of Fit Test are shown below:
\[H_0 : \pi_i = \frac{e^{\beta_0 + \beta_1x_i}}{1 + e^{\beta_0 + \beta_1x_i}}\]
\[H_a : \pi_i \neq \frac{e^{\beta_0 + \beta_1x_i}}{1 + e^{\beta_0 + \beta_1x_i}}\]
The null hypothesis states that the logistic model is a good fit for the data (useful), while the alternative hypothesis states that the model is not a good fit (not useful).
The p-value of this test is calculated below:
\[\text{P-value} = 0.0051 < \alpha\]
With this low p-value, we have sufficient evidence to reject the null hypothesis and conclude that the logistic model is NOT a good fit for this data. The graph showed the same trend, since the curve of the plot does not approach 0 or 1 within the range of the data.
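As a rough check, this p-value can be reproduced from the residual deviance and its degrees of freedom (195.9 on 148, as reported in the model summary in this analysis). The sketch below assumes the test compares the residual deviance to a chi-squared distribution; the variable names are illustrative only.

```r
# Values copied from the glm summary reported in this analysis
residual_deviance <- 195.9
residual_df <- 148

# Upper-tail chi-squared probability: the chance of a deviance at least this
# large if the logistic model were a good fit
gof_p_value <- pchisq(residual_deviance, df = residual_df, lower.tail = FALSE)

print(round(gof_p_value, 4))
print(gof_p_value < 0.05)  # TRUE means we reject the null of a good fit
```

With the fitted model object available, the same quantities could instead be read from `glm.adoption$deviance` and `glm.adoption$df.residual`.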
Now that we have checked the fit of the model, we will conduct our logistic regression test. The results are shown below:
glm.adoption <- glm((Gender>0) ~ DaysListed, data= adoptions, family=binomial)
pander(summary(glm.adoption))
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 0.5418 | 0.2365 | 2.291 | 0.02197 |
| DaysListed | -0.1044 | 0.03522 | -2.963 | 0.003046 |
(Dispersion parameter for binomial family taken to be 1 )
| Null deviance: | 207.9 on 149 degrees of freedom |
| Residual deviance: | 195.9 on 148 degrees of freedom |
Since our hypotheses are specifically looking at the slope (\(\beta_1\)), we will only pay attention to the p-value of DaysListed.
Therefore, our results are:
\[\text{P-value} = 0.003046 < \alpha\]
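With this low p-value, we have sufficient evidence to reject the null hypothesis and conclude that the number of days listed has a significant effect on the probability that the cat is male. Using our logistic regression results, our mathematical model looks like the following: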
\[P(Y_i = 1 | x_i ) \approx \frac{e^{0.5418+(-0.1044)x_i}}{1+e^{0.5418+(-0.1044)x_i}} = \hat{\pi}_i\]
With this fitted model, we can calculate the probability that an adopted cat is male for any given number of days listed.
In order to interpret the effect of the number of days a cat is listed, we look at the odds of a success, that is, the odds that a cat listed on the adoption board is male (when \(Y_i = 1\)). Because \(\beta_1\) is the focus of our hypotheses, we use \(e^{\beta_1}\) to see the percentage change in those odds for every day that goes by. The calculation is shown below.
| Calculations | Interpretation |
|---|---|
| \(e^{\beta_1} = e^{-0.1044} \approx 0.9008649\) | For every additional day that a cat is listed, the odds of that cat being male decrease by approximately 10% (since 1 - 0.901 = 0.099) |
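The calculation in the table can be reproduced with a couple of lines of R. This is a minimal sketch that types in the slope estimate from the summary table by hand; the variable names are illustrative.

```r
# Slope estimate for DaysListed, copied from the regression summary
beta_1 <- -0.1044

# Multiplicative change in the odds of "male" for each additional day listed
odds_ratio <- exp(beta_1)

# The same change expressed as a percentage (negative means a decrease)
percent_change <- (odds_ratio - 1) * 100

print(odds_ratio)
print(round(percent_change, 1))
```

With the fitted model object available, `exp(coef(glm.adoption))` would give the same odds ratio without retyping the estimate.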
This finding supports, rather than contradicts, the idea from the background section that male cats might be more adoptable: the longer a cat has been listed, the less likely it is to be male.
As noted in the background information, a cat waits an average of 14 days from being taken into the shelter to being placed in its new home. To determine the probability that a cat listed for 14 days is male, we will use the following model:
\[P(Y_i = 1 | x_i = 14) \approx \frac{e^{0.5418+(-0.1044)(14)}}{1+e^{0.5418+(-0.1044)(14)}} = \hat{\pi}_i\]
\[\hat{\pi}_i \approx 0.2849986\]
This calculation shows that a cat listed for 14 days at the Idaho Falls Animal Shelter has about a 28.5% chance of being male.
This result could be interpreted in several ways: male cats get adopted sooner than female cats, the older cats are mainly female, and/or there are more female cats occupying the Idaho Falls Animal Shelter. One suggestion for getting more female cats adopted would be to advertise them more on the Facebook page where the shelter lists its cats, or to create more opportunities for adoption, such as adoption events, to help more female cats find their fur-ever home.
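The same prediction can be sketched in R with the built-in logistic function `plogis()`, which computes \(e^{z}/(1+e^{z})\). The coefficients below are typed in from the summary table, and the variable names are illustrative.

```r
# Coefficient estimates copied from the regression summary
beta_0 <- 0.5418   # intercept
beta_1 <- -0.1044  # slope for DaysListed

# Predicted probability that a cat listed for 14 days is male
prob_male_14 <- plogis(beta_0 + beta_1 * 14)

print(round(prob_male_14, 3))
```

With the fitted model object available, `predict(glm.adoption, newdata = data.frame(DaysListed = 14), type = "response")` should give essentially the same value, up to rounding of the coefficients.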